The formal evaluation of educational programs is a relatively recent
phenomenon, and Educational Testing Service research scientists have been
among those striving to chart the unknown waters during the past 15
years. This report is an attempt to record our experiences and the insights
we have gained. Clearly, ETS has not done it all; we have learned much
from the efforts of others in the educational community as well as from our
own endeavors.

It is fitting that the report bears the name of Samuel Ball, for he was
one of our most active program evaluators for 10 years and directed
several pacesetting studies. He resigned his position as a Senior Research
Psychologist in 1978 to accept a chair in education at the University of
Sydney in his native Australia.

Dr. Ball still collaborates with us on occasional projects, and it was
during a visit earlier this year that he put the finishing touches to this
report. We publish it now in the hope that what we have learned about
program evaluation will be of value to others in education.

Samuel J. Messick

Vice President for Research 

June 1, 1979



An Emerging Profession



Evaluating educational programs is an emerging profession, and Educational
Testing Service has played an active role in its development over the
past 15 years. The term "program evaluation" only came into wide use in
the mid-60s, when efforts at systematically assessing programs multiplied.
The purpose of this kind of evaluation is to provide information to
decision-makers who have responsibility for existing or proposed
educational programs. For instance, program evaluation may be used to help
make decisions concerning whether to develop a program (needs assessment),
how best to develop a program (formative evaluation), and whether to
modify, or even continue, an existing program (summative evaluation).

Needs assessment is the process by which one identifies needs and decides
upon priorities among them. Formative evaluation refers to the process
involved when the evaluator helps the program developer, by pretesting
program materials, for example. Summative evaluation is the evaluation of
the program after it is in operation. Arguments are rife among program
evaluators about what kinds of information should be provided in each of
these forms of evaluation.

In general, the ETS posture has been to try to obtain the best (that is,
the most relevant, valid, and reliable) information that can be obtained
within the constraints of cost and time and the needs of the various
audiences for the evaluation. Sometimes this means a tight experimental
design with a national sample; at other times, the best information might
be obtained through an intensive case study of a single institution. ETS has
carried out both traditional and innovative evaluations of both traditional
and innovative programs, and staff members also have cooperated with
other institutions in planning or executing some aspects of evaluation
studies. Along the way, the work by ETS has helped to develop new
viewpoints, techniques, and skills.

The Range of ETS Program
Evaluation Activities

Program evaluation calls for a wide range of skills, and evaluators come
from a variety of disciplines: educational psychology, developmental
psychology, psychometrics, sociology, statistics, anthropology, educational
administration, and a host of subject-matter areas. As program evaluation
began to emerge as a professional concern in these fields, so ETS changed,
both structurally and functionally. The structural changes were not
exclusively tuned to the needs of conducting program evaluations. Rather,
program evaluation, like the teaching of English in a well-run high school,
became to some degree the concern of virtually all the professional staff.
Thus, new research groups were added, and they augmented the
organization's capability to conduct program evaluations.



The functional response was many-faceted. Two of the earliest evaluation
studies conducted by ETS indicate the breadth of the range of interest.
In 1965, collaborating with the Pennsylvania State Department of Education,
Henry Dyer of ETS set out to establish a set of educational goals
against which later the performance of the state's educational system
could be evaluated. A unique aspect of this endeavor was Dyer's insistence
that the goal-setting process be opened up to strong participation by the
state's citizens and not left solely to a professional or political elite.
(In fact, ETS program evaluation has been marked by a strong emphasis, when
at all appropriate, on obtaining community participation.)

The other early evaluation study in which ETS was involved was the
now famous Coleman Report (Equality of Educational Opportunity),
issued in 1966. ETS staff, under the direction of Albert E. Beaton, had major
responsibility for design of the study and analysis of the massive data
generated. Until then, studies of the effectiveness of the nation's schools,
especially with respect to programs' educational impact on minorities, had
been small-scale. So the collection and analysis of data concerning tens of
thousands of students and hundreds of schools and their communities were
new experiences for ETS and for the profession of program evaluation.

In the intervening years, the Coleman Report and the Pennsylvania
Goals Study have become classics of their kind, and from these two
auspicious early efforts ETS has become a center of major program evaluation.
Areas of extensive endeavor have been, and are, diverse. They include
computer-aided instruction, aesthetics and creativity in education,
educational television, educational programs for prison inmates, reading
programs, camping programs, career education, bilingual education, higher
education, preschool programs, special education, and drug programs.
(For brief descriptions of ETS work in these areas, see Appendix A.) ETS also
has evaluated programs relating to year-round schooling, English as a
second language, desegregation, performance contracting, women's education,
busing, Title I of the Elementary and Secondary Education Act (ESEA),
accountability, and basic information systems.

One piece of work which must be mentioned is the Encyclopedia of
Educational Evaluation, published in 1975 by Jossey-Bass Publishers, Inc. It
was edited by Scarvia B. Anderson, Samuel Ball, and Richard T. Murphy,
and contains articles by them and 36 other members of the ETS staff.
Subtitled Concepts and Techniques for Evaluating Education and Training
Programs, it contains 141 articles in all.



ETS Contributions to
Program Evaluation

Given the innovativeness of many of the programs evaluated, the newness
of the profession of program evaluation, and the level of expertise of
the ETS staff who have directed these studies, it is not surprising that the
evaluations themselves have been marked by innovations for the profession
of program evaluation. At the same time, ETS has adopted several
principles relative to each aspect of program evaluation. It will be useful to
examine these innovations and principles in terms of the phases that a
program evaluation usually attends to: goal setting, measurement selection,
implementation in the field setting, analysis, and interpretation and
presentation of evidence.

Making Goals Explicit

It would be a pleasure to report that virtually every educational program
has a well-thought-through set of goals, but it is not so. It is, therefore,
necessary at times for program evaluators to help verbalize and clarify the
goals of a program to ensure that they are, at least, explicit. Further, the
evaluator may even be given goal development as a primary task, as in
the Pennsylvania Goals Study. This was seen again in a similar program,
when Robert Feldmesser, in 1973, helped the New Jersey State Board of
Education establish goals that underwrite conceptually that state's
"thorough and efficient" education program.

Work by ETS staff indicates there are four important principles with
respect to program goal development and explication. The first of these
principles:

What program developers say their program goals are may bear
only a passing resemblance to what the program in fact seems
to be doing.

This principle, the occasional surrealistic quality of program goals, has
been noted on a number of occasions. For example, assessment instruments
developed for a program evaluation on the basis of the stated goals
sometimes do not seem at all sensitive to the actual curriculum. As a result,
ETS program evaluators seek, whenever possible, to cooperate with program
developers to help fashion the goals statement. The evaluators also
will attempt to describe the program in operation and relate that
description to the stated goals, as in the case of the 1971 evaluation of the
second year of Sesame Street for Children's Television Workshop by Gerry Ann
Bogatz and Samuel Ball. This comparison is an important part of the process
and represents sometimes crucial information for decision-makers
concerned with developing or modifying a program.

The second principle:

When program evaluators work cooperatively with developers
in making program goals explicit, both the program and the
evaluation seem to benefit.

The original Sesame Street evaluation in 1970 exemplified the usefulness
of this cooperation. At the earliest planning sessions for the program,
before it had a name and before it was fully funded, the developers, aided
by ETS, hammered out the program goals. Thus, ETS was able to learn at
the outset what the program developers had in mind, ensuring sufficient
time to provide adequately developed measurement instruments. If the
evaluation team had had to wait until the program itself was developed,
there would not have been sufficient time to develop the instruments;
more important, the evaluators might not have had sufficient understanding
of the intended goals, thereby making sensible evaluation unlikely.

The third principle: 

There is often a great deal of empirical research to be conducted
before program goals can be specified.

Sometimes, even before goals can be established or a program developed,
it is necessary, through empirical research, to indicate that there is a
need for the program. An illustration is provided by the 1976 research of
Ruth Ekstrom and Marlaine Lockheed into the competencies gained by
women through volunteer work and homemaking. The ETS researchers
argued that it is desirable for women to resume their education if they wish
to after years of absence. But what competencies have they picked up in
the interim that might be worthy of academic credit? By identifying,
surveying, and interviewing women who wished to return to formal education,
Ekstrom and Lockheed established that many had indeed learned valuable
skills and knowledge. Colleges were alerted, and some have begun
to give credit where credit is due.

Similarly, when the federal government decided to make a concerted
attack on the reading problem as it affects the total population, one area
of concern was adult reading. But there was little knowledge about it. Was
there an adult literacy problem? Could adults read with sufficient
understanding such items as newspaper employment advertisements, shopping
and movie advertisements, and bus schedules? And in investigating adult
literacy, what characterized the reading tasks that should be taken into
account? Murphy, in a 1973 study, considered these factors: the importance
of a task (the need to be able to read the material, if only once a year, as
with income tax forms and instructions), the intensity of the task (a person
who wants to work in the shipping department will have to read the
shipping schedule each day), or the extensivity of the task (70 percent of the
adult population read a newspaper, but it can usually be ignored without
gross problems arising). Murphy and other ETS researchers conducted
surveys of reading habits and abilities, and this assessment of needs provided
the government with information needed to decide on goals and develop
appropriate programs.

Still a different kind of needs assessment was conducted by ETS
researchers with respect to a school for learning disabled students in 1976.
The school catered to children aged 5-16 and had four separate programs
and sites. ETS first served as a catalyst, helping the school's staff develop a
listing of problems. Then ETS acted as an amicus curiae, drawing attention
to those problems, making explicit and public what might have been
unsaid for want of an appropriate forum. Solving these problems was the
purpose of stating new institutional goals, goals that might never have been
formally recognized if ETS had not worked with the school to make its
needs explicit.

The fourth principle: 

The program evaluator should be conscious of and interested in
the unintended outcomes of programs as well as the intended
outcomes specified in the program's goal statement.

In program evaluation, the importance of looking for side effects,
especially negative ones, has to be considered against the need to put a
major effort into assessing progress toward intended outcomes. Often, in
this phase of evaluation, the varying interests of evaluators, developers,
and funders intersect, and professional, financial, and political
considerations are all at odds. At such times, program evaluation becomes as
much an art form as an exercise in social science.

A number of articles have been written about this problem by Samuel J.
Messick, ETS vice president for research. His viewpoint, the importance of
the medical model, has been illustrated in various ETS evaluation studies.
His major thesis is that the medical model of program evaluation
explicitly recognizes that ". . . prescriptions for treatment and the evaluation
of their effectiveness should take into account not only reported symptoms
but other characteristics of the organism and its ecology as well" (Messick,
1973, p. 245). As Messick goes on to point out, this is a call for a systems
analysis approach to program evaluation: dealing empirically with the
interrelatedness of all the factors and monitoring all outcomes, not just the
intended ones.

When, for example, ETS evaluated the first two years of Sesame
Street, there was obviously pressure to ascertain whether the intended
goals of that show were being attained. It was nonetheless possible to
look for some of the more likely unintended outcomes: whether the show
had negative effects on heavy viewers going off to kindergarten, and
whether the show was achieving impacts in attitudinal areas.

In summative evaluations, to study unintended outcomes is bound to
cost more money than to ignore them. It is often difficult to secure
increased funding for this purpose. For educational programs with potential
national applications, however, ETS strongly supports this more
comprehensive approach.

Measuring Program Impact

The letters ETS have become almost synonymous in some circles with
standardized testing of student achievement. In its program evaluations,
ETS naturally uses such tests as appropriate, but frequently the
standardized tests are not appropriate measures. In some evaluations, ETS uses
both standardized and domain-referenced tests. An example may be
seen in The Electric Company evaluations of 1973 and 1974 (Ball, Bogatz,
K. M. Kazarow, Donald B. Rubin). This televised series, which was intended
to teach reading skills to first through fourth graders, was evaluated in
some 600 classrooms. One question that was asked during the process
concerned the interaction of the student's level of reading attainment and
the effectiveness of viewing the series. Do "good" readers learn more from
the series than poor readers? So standardized, norm-referenced reading
tests were administered, and the students in each grade were divided into
deciles on this basis, thereby yielding 10 levels of reading attainment.

Data on the outcomes using the domain-referenced tests were
subsequently analyzed for each decile ranking. Thus, ETS was able to specify for
what level of reading attainment, in each grade, the series was working
best. This kind of conclusion would not have been possible if a specially
designed domain-referenced reading test with no external referent had been
the only one used, nor if a standardized test, not sensitive to the program's
impact, had been the only one used.
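
The decile analysis described above can be sketched in a few lines of
Python. This is only a minimal illustration under stated assumptions, not
the procedure ETS actually ran: the scores are synthetic, the function names
(assign_deciles, mean_gain_by_decile) are hypothetical, and ties in the
norm-referenced score are broken by input order.

```python
from collections import defaultdict


def assign_deciles(norm_scores):
    """Rank students on a norm-referenced score and return a decile
    (1 = lowest tenth, 10 = highest tenth) for each student, in input order."""
    n = len(norm_scores)
    order = sorted(range(n), key=lambda i: norm_scores[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles


def mean_gain_by_decile(deciles, gains):
    """Average the domain-referenced gain (posttest minus pretest) per decile."""
    buckets = defaultdict(list)
    for d, g in zip(deciles, gains):
        buckets[d].append(g)
    return {d: sum(v) / len(v) for d, v in sorted(buckets.items())}


# Synthetic data for 20 students in one grade (hypothetical values).
norm_scores = list(range(100, 120))   # norm-referenced reading test
gains = [5] * 10 + [2] * 10           # gain on the domain-referenced test
deciles = assign_deciles(norm_scores)
summary = mean_gain_by_decile(deciles, gains)
```

Here the norm-referenced test supplies only the grouping, while the
domain-referenced test supplies the outcome, which is why neither test
alone would support the comparison.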

Without denying the usefulness of previously designed and developed
measures, ETS evaluators have frequently preferred to develop or adapt
instruments that would be specifically sensitive to the tasks at hand.
Sometimes this measurement effort is carried out in anticipation of the needs of
program evaluators for a particular instrument, and sometimes because a
current program evaluation requires immediate instrumentation.

An example of the former is the 1976 study of doctoral programs by
Mary Jo Clark, Rodney T. Hartnett, and Leonard L. Baird. Existing
instruments had been based on surveys in which practitioners in a given
discipline were asked to rate the quality of doctoral programs in that discipline.
Instead of this "reputational survey" approach, the ETS team developed an
array of criteria (e.g., faculty quality, student body quality, resources,
academic offerings, alumni performance), all open to objective assessment.
This new assessment tool can now be used to assess changes in the quality
of the doctoral programs offered by major universities.

Similarly, the 1976 development by ETS of the Kit of Reference Tests for
Cognitive Factors (Ekstrom, John French, Harry H. Harman) also provided a
tool, one that could be used when evaluating the cognitive structures of
teachers or students if these structures were of interest in a particular
evaluation. A clearly useful application was in the California study of teaching
performance, also in 1976, by Frederick McDonald and Patricia Elias.
Teachers with certain kinds of cognitive structures were seen to have
differential impacts on student achievement. In the Trismen study of the
aesthetics program previously referred to, the factor kit was used to see
whether cognitive structures interacted with aesthetic judgments.




Developing special instruments. Examples of the development of specific
instrumentation for ETS program evaluations are numerous. Virtually every
program evaluation involves, at the very least, some adapting of existing
instruments. For example, a questionnaire or interview may be adapted
from ones developed for earlier studies. Typically, however, new
instruments, including goal-specific tests, are prepared. Some ingenious
examples, based on the 1966 work of E. J. Webb, D. T. Campbell, R. D.
Schwartz, and L. Sechrest, were suggested by Anderson for evaluating
museum programs, and the title of her 1968 article gives a flavor of the
unobtrusive measures illustrated: "Noseprints on the Glass."

Another example of ingenuity is Donald A. Trismen's use of 35mm
slides as stimuli in the assessment battery of the Education through Vision
program. Each slide presented an art masterpiece, and the response
options were four abstract designs varying in color. The instruction to the
student was to pick the design that best illustrated the masterpiece's coloring.

Using multiple measures. When ETS evaluators have to assess a variable
and the usual measures have rather high levels of error inherent in them,
they usually resort to "triangulation." That is, they use multiple measures of
the same construct, knowing that each measure suffers from a specific
weakness. Thus, in 1975, Donald E. Powers evaluated for the Philadelphia
school system the impact of dual-audio television: a television show
telecast at the same time as a designated FM radio station provided an
appropriate educational commentary. One problem in measurement was
assessing the amount of contact the student had with the dual-audio
television treatment. Powers used home telephone interviews, student
questionnaires, and very simple knowledge tests of the characters in the shows
to assess whether students had in fact been exposed to the treatment.
Each of these three measures has problems associated with it, but the
combination provided a useful assessment index.
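
One simple way to combine several imperfect measures of the same construct
is to standardize each and average them. The sketch below assumes equal
weights and invented data; the report does not say how the three measures
were actually combined, so the function names and the weighting are
illustrative assumptions only.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of scores to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def exposure_index(*measures):
    """Equal-weight average of standardized measures, one value per student."""
    standardized = [z_scores(m) for m in measures]
    return [mean(student) for student in zip(*standardized)]


# Hypothetical data for four students.
interview = [0, 1, 2, 3]        # viewing days reported in a phone interview
questionnaire = [1, 1, 3, 3]    # viewing days reported by the student
knowledge = [2, 4, 6, 8]        # items correct on a character-knowledge test
index = exposure_index(interview, questionnaire, knowledge)
```

Because each measure is put on a common scale before averaging, an error
peculiar to one instrument moves the index less than it would move that
instrument's raw score.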

In some circumstances, ETS evaluators are able to develop measurement
techniques that are an integral part of the treatment itself. This
unobtrusiveness has clear benefits and is most readily attainable with
computer-aided instructional (CAI) programs. Thus, for example, Donald L.
Alderman, in the evaluation of TICCIT (a CAI program developed by the Mitre
Corporation), obtained for each student such indices as the number of lessons
passed, the time spent on line, the number of errors made, and the kinds
of errors. And he did this simply by programming the computer to save this
information over given periods of time.

Working in Field Settings 

Measurement problems cannot be addressed satisfactorily if the setting in
which the measures are to be administered is ignored. One of the clear
lessons learned in ETS program evaluation studies is that measurement in field
settings (home, school, community) poses different problems from
measurement conducted in a laboratory.

Program evaluation, whether formative or summative, demands that
its empirical elements usually be conducted in natural field settings rather
than in more contrived settings, such as a laboratory. Nonetheless, the
problems of working in field settings are rarely systematically discussed or
researched. In 1975, in an article in the Encyclopedia of Educational
Evaluation, Bogatz detailed these major aspects:

• Obtaining permission to collect data at a site

• Selecting a field staff

• Training the staff

• Maintaining family/community support.



Of course, all the aspects discussed by Bogatz interact with the
measurement and design of the program evaluation. A great source of
information concerning field operations is the ETS Head Start Longitudinal Study
of Disadvantaged Children, directed by Virginia Shipman. Although not
primarily a program evaluation, it certainly has generated implications for
early childhood programs. It was longitudinal, comprehensive in scope,
and large in size, encompassing four sites and, initially, some 2,000
preschoolers. It was clear from the outset that close community ties were
essential, if only for expediency, although, of course, more important ethical
principles were involved. This close relationship with the communities in
which the study was conducted involved using local residents as supervisors
and testers, establishing local advisory committees, and thus ensuring free,
two-way communication between the research team and the community.

The Sesame Street evaluation also adopted this approach. In part
because of time pressures and in part to ensure valid test results, the ETS
evaluators especially developed the tests so that community members with
minimal educational attainments could be trained quickly to administer
them with proper skill.

Establishing community rapport. In evaluations of street academies by
Ronald L. Flaugher and of education programs in prisons by Flaugher and
Samuel Barnett, it was argued that one of the most important elements in
successful field relationships is the time an evaluator spends getting to
know the interests and concerns of various groups, and lowering barriers of
suspicion that frequently separate the "educated evaluator" and the
less-educated program participants. This may not seem a particularly
sophisticated or complex point, but many program evaluations have floundered
because of an evaluator's lack of regard for disadvantaged communities.
Therefore, a firm principle underlying ETS program evaluation is to be
concerned with the communities that provide the contexts for the programs
being evaluated. Establishing two-way lines of communication with these
communities and using community resources whenever possible help
ensure a valid evaluation.

Even with the best possible community support, field settings cause
problems for measurement. Raymond G. Wasdyke and Jerilee Grandy
showed this to be true in a 1976 evaluation when the field setting was
literally that: a field setting. In studying the impact of a camping program
on New York City grade school pupils, they recognized the need, common
to most evaluations, to describe the treatment, in this case the camping
experience. Therefore, ETS sent an observer to the campsite with the
treatment groups. This person, who was herself skilled in camping, managed
not to be an obtrusive participant by maintaining a relatively low profile.

Of course, the problems of the observer can be just as difficult in formal
institutions as on the campground. In their 1974 evaluation of Open
University materials, Hartnett, Clark, Feldmesser, et al. found, as have
program evaluators in almost every situation, that there was some
defensiveness in each of the institutions where they worked. Both personal and
professional contacts were used to allay suspicions. There also was emphasis
on an evaluation design that took into account each institution's values.
That part of the evaluation was specific to the institution, but some
common elements across institutions were retained. This strategy underscored
the evaluators' realization that each institution was different, but allowed
ETS to study certain variables across all three participating institutions.

Breaking down the barriers in a field setting is one of the important
elements of a successful evaluation, yet each situation demands somewhat
different evaluator responses.

Involving program staff. Another way of ensuring that evaluation field staff
are accepted by program staff is to make the program staff active
participants in the evaluation process. While this is obviously a technique to be
strongly recommended in formative evaluations, it can also be used in
summative evaluations. In his 1977 evaluation of PLATO in junior colleges,
Murphy could not afford to become the victim of a program developer's
fear of an insensitive evaluator. He overcame this potential problem by
enlisting the active participation of the junior college and program
development staffs. One of Murphy's concerns was that there is no common course
across colleges. Introduction to Psychology, for example, might be taught
virtually everywhere, but the content can change remarkably, depending
on such factors as who teaches the course, where it is taught, and what
text is used. Murphy understood this variability, and his evaluation of PLATO
reflected his concern. It also necessitated considerable input and
cooperation from program developers and college teachers working in concert,
with Murphy acting as the conductor.

Analyzing the Data

After the principles and strategies used by program evaluators in their field
operations are successful and data are obtained, there remains the
important phase of data analysis. In practice, of course, the program evaluator
thinks through the question of data analysis before entering the data
collection phase. Plans for analysis help determine what measures to develop,
what data to collect, and even, to some extent, how the field operation
is to be conducted. Nonetheless, analysis plans drawn up early in the
program evaluation cannot remain quite as immutable as the Mosaic Law.
To illustrate the need for flexibility, it is useful to turn once again to the
heuristic ETS evaluation of Sesame Street.

As initially planned, the design of the Sesame Street evaluation was a
true experiment. The analyses called for were multivariate analyses of
covariance, using pretest scores as the covariate. At each site, a pool of
eligible preschoolers was obtained by community census, and experimental
and control groups were formed by random assignment from these pools.
The evaluators were somewhat concerned that those designated to be
the experimental (viewing) group might not view; it was a new show on
public television, a loose network of TV stations not noted for high
viewership. Some members of the Sesame Street national research advisory
committee counseled ETS to consider paying the experimental group to view.
The suggestion was resisted, however, because any efforts above mild
and occasional verbal encouragement to view the show would compromise
the results. If the experimental group members were paid, and if they
then viewed extensively and outperformed the control group at posttest,
would the improved performance be due to the viewing, the payment, or
some interaction of payment and viewing? Of course, this nice argument
proved to be not much more than an exercise in modern scholasticism. In
fact, the problem lay not in the treatment group but in the uninformed and
unencouraged-to-view control group. The members of that group, as
indeed preschoolers with access to public television throughout the nation,
were viewing the show with considerable frequency, and not much less
than the experimental group. Thus, the planned analysis involving
differences in posttest attainments between the two groups was dealt a mortal
blow.
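
To make the planned comparison concrete, the sketch below computes a
covariance-adjusted difference in posttest means for a single outcome and a
single pretest covariate. It is a deliberately simplified, univariate
stand-in for the multivariate analyses of covariance mentioned above; the
function name and all scores are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def adjusted_difference(pre_e, post_e, pre_c, post_c):
    """Covariance-adjusted difference in posttest means (E minus C),
    using the pooled within-group slope of posttest on pretest."""
    sxy = sxx = 0.0
    for pre, post in ((pre_e, post_e), (pre_c, post_c)):
        mx, my = mean(pre), mean(post)
        sxy += sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx += sum((x - mx) ** 2 for x in pre)
    slope = sxy / sxx
    raw = mean(post_e) - mean(post_c)
    # Remove the part of the raw difference explained by pretest differences.
    return raw - slope * (mean(pre_e) - mean(pre_c))


# Hypothetical scores: posttest = pretest + 10 for E, pretest + 4 for C.
pre_e, post_e = [10, 12, 14, 16], [20, 22, 24, 26]
pre_c, post_c = [8, 10, 12, 14], [12, 14, 16, 18]
effect = adjusted_difference(pre_e, post_e, pre_c, post_c)
```

An estimate of this kind measures only the contrast between the two groups.
If the control group views almost as much as the experimental group, the
adjusted difference shrinks toward zero whatever the show's real effect.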

Fortunately, other analyses were available, of which the ETS-refined
Age Cohorts Design provided a rational basis. This design is presented in
the relevant report (Ball and Bogatz, 1970). The need here is not to
describe the new design and analysis but to emphasize a point made
practically by Robert Burns some time ago and repeated here more prosaically:
The best laid plans of evaluators can "gang agley," too.

Clearing new paths. Sometimes program evaluators find that the design
and analysis they have in mind represent an untrodden path. This is
perhaps in part because many of the designs in the social sciences are built
upon laboratory conditions and simply are not particularly relevant to what
happens in educational institutions.

When ETS designed the summative evaluation of The Electric
Company, it was able to set up a true experiment in the schools. Pairs of
comparable classrooms within a school and within a grade were
designated as the pool with which to work. One of each pair of classes was
randomly assigned to view the series. Pretest scores were used as covariates
on posttest scores, and in 1973 the first-year evaluation analysis was
successfully carried out (Ball and Bogatz). The evaluation was continued
through a second year, however, and as is usual in schools, the classes did
not remain intact.
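
The first-year assignment scheme (one classroom of each matched pair chosen
at random to view) can be sketched as follows. The classroom labels, the
function name, and the seed are hypothetical, and the sketch says nothing
about how the pairs were judged comparable.

```python
import random


def assign_pairs(pairs, seed=None):
    """For each pair of comparable classrooms, randomly pick one to view
    the series (E); the other becomes the control (C)."""
    rng = random.Random(seed)
    assignment = {}
    for a, b in pairs:
        viewer = rng.choice((a, b))
        control = b if viewer == a else a
        assignment[viewer] = "E"
        assignment[control] = "C"
    return assignment


# Hypothetical labels: same school and same grade within each pair.
pairs = [("School1-G2-A", "School1-G2-B"),
         ("School1-G3-A", "School1-G3-B"),
         ("School2-G2-A", "School2-G2-B")]
assignment = assign_pairs(pairs, seed=1973)
```

Randomizing within pairs keeps school and grade balanced across the two
conditions by construction, so the pretest covariate has less to correct.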

From an initial 200 classes, the children had scattered through many
more classrooms. Virtually none of the classes with subject children
contained only experimental or only control children from the previous year.
Donald B. Rubin, an ETS statistician, consulted with a variety of authorities
and found that the design and analysis problem for the second year of the
evaluation had not been addressed in previous work. To summarize the
solution decided on, the new pool of classes was reassigned randomly to E
(experimental) or C (control) conditions so that over the two years the
design was portrayable as:




             YEAR I                   YEAR II
          Pre     Post             Pre     Post

                              EE
     E  ----------------->
                              EC

                              CE
     C  ----------------->
                              CC

Note: For Year II, EE represents children who were in E classrooms in Year I
and again in Year II. That is, the first letter refers to status in Year I and
the second to status in Year II.

Further, the pretest scores of Year II were usable as new covariates when
analyzing the results of the Year II posttest scores.
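
The two-stage assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure ETS actually ran: the class counts, the child and classroom names, the equal split of the Year II pool, and the helper functions `assign_pairs` and `assign_pool` are all hypothetical.

```python
import random
from collections import Counter


def assign_pairs(pairs, rng):
    """Within each matched pair, pick one classroom at random to be E; the other is C."""
    status = {}
    for a, b in pairs:
        viewer = rng.choice((a, b))
        status[a] = "E" if viewer == a else "C"
        status[b] = "E" if viewer == b else "C"
    return status


def assign_pool(classes, rng):
    """Randomly split a pool of classrooms into equal-sized E and C halves (an assumption)."""
    shuffled = list(classes)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {cls: ("E" if i < half else "C") for i, cls in enumerate(shuffled)}


rng = random.Random(1973)  # fixed seed so the sketch is reproducible

# Year I (hypothetical data): 4 matched pairs of classrooms, 5 children in each.
year1_pairs = [(f"y1_class{2 * i}", f"y1_class{2 * i + 1}") for i in range(4)]
year1_status = assign_pairs(year1_pairs, rng)
year1_class_of = {}
for pair in year1_pairs:
    for cls in pair:
        for _ in range(5):
            year1_class_of[f"child{len(year1_class_of)}"] = cls

# Year II: children scatter into a new pool of classrooms, which is re-randomized.
year2_classes = [f"y2_class{i}" for i in range(6)]
year2_class_of = {child: rng.choice(year2_classes) for child in year1_class_of}
year2_status = assign_pool(year2_classes, rng)

# Two-letter code per child: Year I status first, Year II status second.
two_year_code = {
    child: year1_status[year1_class_of[child]] + year2_status[year2_class_of[child]]
    for child in year1_class_of
}
group_sizes = Counter(two_year_code.values())
print(sorted(group_sizes.items()))
```

Every child ends up in one of the groups EE, EC, CE, or CC, which is what allows the Year II analysis to compare children by their history across both years.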

Tailoring to the task. Unfortunately for those who prefer routine
procedures, it has been shown across a wide range of ETS program evaluations
that each design and analysis must be tailored to the occasion. Thus, Gary
Marco, as part of the 1972 statewide educational assessment in Michigan,
evaluated ESEA Title I program performance. He assessed the amount of
exposure students had to various clusters of Title I programs, and he
included control schools in the analysis. He found that a regression analysis
model involving a correction for measurement error was an innovative
move that best fit his complex configuration of data.
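
The report does not say which correction Marco used. As a stand-in, the sketch below shows the classical single-predictor attenuation correction, in which the observed regression slope is divided by the reliability of the error-laden predictor. The simulated data, the variable names, and the reliability of 0.8 are assumptions chosen only to make the effect visible.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def disattenuate(observed_slope, reliability):
    """Classical correction: divide the observed slope by the predictor's reliability."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return observed_slope / reliability


rng = random.Random(1972)  # fixed seed so the sketch is reproducible
true_slope = 1.0
true_sd, error_sd = 2.0, 1.0
reliability = true_sd**2 / (true_sd**2 + error_sd**2)  # 0.8 by construction

# Hypothetical "exposure" measure observed with error, and an outcome driven by true exposure.
true_exposure = [rng.gauss(0.0, true_sd) for _ in range(2000)]
measured_exposure = [t + rng.gauss(0.0, error_sd) for t in true_exposure]
outcome = [true_slope * t + rng.gauss(0.0, 1.0) for t in true_exposure]

observed = ols_slope(measured_exposure, outcome)
corrected = disattenuate(observed, reliability)
print(f"observed slope {observed:.3f}, corrected slope {corrected:.3f}")
```

Measurement error in the predictor pulls the observed slope toward zero, and the correction moves it back toward the value used to generate the data.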

Garlie Forehand, Marjorie Ragosta, and Donald A. Rock, in a large-scale
national, correlational study of desegregation completed in 1976,
obtained data on school characteristics and on student outcomes. The
purposes of the study included defining indicators of effective desegregation
and discriminating between more and less effective school desegregation
programs. The emphasis throughout the effort was on variables that were
manipulable. That is, the idea was that evaluators would be able to
suggest practical advice on what schools can do to achieve a productive
desegregation program. Initial investigations allowed specification among
the myriad variables of a hypothesized set of causal relationships, and the
use of path analysis made possible estimation of the strength of
hypothesized causal relationships. On the basis of the initial correlational
matrices, the path analyses, and the observations made during the study, an
important product (a nontechnical handbook for use in schools) was
developed.

Another large-scale ETS evaluation effort was directed by Trismen,
M. A. Waller, and Gita Wilder. They studied compensatory reading
programs, initially surveying more than 700 schools across the country. Over a
four-year period ending in 1976, this evaluation interspersed data analysis
with new data collection efforts. One purpose was to find schools that
provided exceptionally positive or negative program results. These schools
were visited "blind" and observed by ETS staff. Whereas the Forehand
evaluation analysis was geared to obtaining practical applications, the equally
extensive evaluation analysis of Trismen's study was aimed at generating
hypotheses to be tested in a series of smaller experiments.

As a further illustration of the complex interrelationship among
evaluation purposes, design, analyses, and products, there is the 1977
evaluation of the use of PLATO in the elementary school by Spencer Swinton and
Marianne Amarel. They used a form of regression analysis, as did
Forehand and Trismen. But here the regression analyses were used differently in
order to identify program effects unconfounded by teacher differences. In
this regression analysis, teachers became fixed effects, and contrasts were
fitted for each within-teacher pair (experimental versus control classroom
teachers).
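
One way to read the fixed-effects approach described above is sketched below. It assumes that each "pair" is a matched experimental and control classroom and that the outcome is a classroom mean gain. The data values and function names are hypothetical, and the actual PLATO model was certainly richer. In this balanced case, the treatment coefficient from a regression with pair fixed effects equals the mean of the within-pair contrasts.

```python
def pair_contrasts(pairs):
    """E-minus-C contrast within each matched pair."""
    return {name: e - c for name, (e, c) in pairs.items()}


def fixed_effects_estimate(pairs):
    """Treatment coefficient from y = pair effect + tau * treated + error.

    Pair effects are removed by centering both the outcome and the
    treatment indicator within each pair.
    """
    num = 0.0
    den = 0.0
    for e, c in pairs.values():
        pair_mean = (e + c) / 2.0
        for outcome, treated in ((e, 1.0), (c, 0.0)):
            t_centered = treated - 0.5  # each pair has one E and one C classroom
            num += t_centered * (outcome - pair_mean)
            den += t_centered**2
    return num / den


# Hypothetical classroom mean gains: (experimental, control) per matched pair.
pairs = {"pair_A": (12.0, 9.0), "pair_B": (7.0, 6.0), "pair_C": (15.0, 10.0)}

contrasts = pair_contrasts(pairs)
mean_contrast = sum(contrasts.values()) / len(contrasts)
tau = fixed_effects_estimate(pairs)
print(contrasts, mean_contrast, tau)
```

Because differences between pairs are absorbed by the pair effects, only the within-pair comparison contributes to the estimated program effect.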

This, in turn, provides a contrast to McDonald's 1977 evaluation of
West New York programs to teach English as a second language to adults.
In this instance, the regression analysis was directed toward showing which
teaching method related most to gains in adult students' performance.

There is a school of thought within the evaluation profession that
design and analysis in program evaluation can be made routine. At this point
the experience of ETS indicates that this would be unwise.



Interpreting the Results

Possibly the most important principle in program evaluation is that
interpretations of the evaluation's meaning (the conclusions to be drawn) are
often open to various nuances. Another problem is that the evidence on
which the interpretations are based may be inconsistent. The initial
premise of this article was that the role of program evaluation is to provide
evidence for decision-makers. Thus, one could argue that differences in
interpretation, and inconsistencies in the evidence, are simply problems for the
decision-maker and not for the evaluator.

But consider, for example, an evaluation by Powers in 1974 and 1975
of a year-round program in a school district in Virginia. (The long vacation
was staggered around the year so that schools remained open in the
summer.) The evidence presented by Powers indicated that the year-round
school program provided a better utilization of physical plant and that
student performance was not negatively affected. The school board
considered this evidence as well as other conflicting evidence provided by Powers
that the parents' attitudes were decidedly negative. The board made up
its mind, and (not surprisingly) scotched the program. Clearly, however, the
decision was not up to Powers. His role was to collect the evidence and
present it systematically.

Keeping the process open. In general, the ETS response to conflicting
evidence or varieties of nuances in interpretation is to keep the evaluation
process and its reporting as open as possible. In this way, the values of the
evaluator, though necessarily present, are less likely to be a predominating
influence on subsequent action.

Program evaluators do, at times, have the opportunity to influence
decision-makers by showing them that there are kinds of evidence not
typically considered. The Coleman Study, for example, showed at least some
decision-makers that there is more to evaluating school programs than
counting (or calculating) the numbers of books in libraries, the amount of
classroom space per student, the student-teacher ratio, and the availability
of audiovisual equipment. Rather, the output of the schools in terms of
student performance was shown to be generally superior as evidence of
school program performance.

Through their work, evaluators are also able to educate
decision-makers to consider the important principle that educational treatments
may have positive effects for some students and negative effects for others,
so that an interaction of treatment with student should be looked for. As
pointed out in the discussion of unintended outcomes, a system-analysis
approach to program evaluation (dealing empirically with the
interrelatedness of all the factors that may affect performance) is to be
preferred. And this approach, as Messick emphasizes, "properly takes into
account those student-process-environment interactions that produce
differential results" (p. 246).

Selecting appropriate evidence. Finally, a consideration of the kinds of
evidence and interpretations to be provided decision-makers leads
inexorably to the realization that different kinds of evidence are needed,
depending on the decision-maker's problems and the availability of
resources. The most "scientific" evidence involving objective data on student
performance can be brilliantly interpreted by an evaluator, but it might also
be an abomination to a decision-maker who really needs to know whether
teachers' attitudes are favorable.

ETS, over the past 10 years, has provided a great variety of evidence.
For a formative evaluation in Brevard County, Florida, in 1970, Trismen
provided evidence that students could make intelligent choices about courses.
In the ungraded schools, students had considerable freedom of choice, but
they and their counselors needed considerably more evidence than in
traditional schools about the ingredients for success in each of the available
courses. In 1977, Gary Echternacht helped state and local education
authorities develop Title I reporting models that included evidence on impact,
cost, and compliance with federal regulations. Forehand and McDonald
have been working with New York City to develop an accountability model
providing constructive kinds of evidence for the city's school system. On the
other hand, as part of an evaluation team, Amarel is providing, for a small
experimental school in Chicago, judgmental data as well as reports and
documents based on the school's own records and files. And Michael
Rosenfeld, in 1973, provided Montgomery Township, New Jersey, with
student, teacher, and parent perceptions in his evaluation of the open
classroom approach then being tried out.

In short, just as tests are not valid or invalid (it is the ways tests are used
that deserve such descriptions), so, too, evidence is not good or bad until it
is seen in relation to the purpose for which it is to be used, and in relation to
its utility to decision-makers.
[illegible] involvement in program evaluation has been at the
[illegible]. Without an accompanying concern for theoretical and
professional issues, however, practical involvement would be irresponsible.
[illegible] systematize its growing knowledge [illegible] program
evaluation. Thus, Anderson obtained a contract with the [illegible]
accumulated knowledge of professionals from inside and outside ETS on the
topic of program evaluation. A number of products followed. These included
a survey of practices in program evaluation and a codification of program
evaluation [illegible] and issues. Perhaps the most generally useful of the
products is the aforementioned Encyclopedia of Educational Evaluation.

[illegible] of experience in program evaluation in one [illegible] because
there is no specific "party line," no [illegible] responses. It remains quite
possible for different program evaluators at ETS to recommend differently
designed evaluations for the same burgeoning or existing programs.

There is no sure knowledge where the profession of program evaluation
[illegible] program evaluation will experience amazing growth over the next
decade, growth that will dwarf its current status (which already dwarfs its
status of a decade ago). Or perhaps [illegible] scientific techniques
[illegible] of program development and justification. At ETS, the consensus
is that continued growth is the more likely [illegible] backgrounds and
accumulating experience [illegible] contributions to [illegible].
Descriptions of ETS Studies in Some Key Categories

Aesthetics and Creativity in Education

For Bartlett Hayes III's program of Education through Vision at Andover
Academy, Donald A. Trismen developed a battery of evaluation
instruments that assessed, inter alia, a variety of aesthetic judgments. Other ETS
staff members working in this area have included Norman Frederiksen and
William C. Ward, who have developed a variety of assessment techniques
for tapping creativity and scientific creativity; Richard T. Murphy, who also
has developed creativity-assessing techniques; and Scarvia B. Anderson,
who described, in 1968, a variety of ways to assess the effectiveness of
aesthetic displays.

Bilingual Education

ETS staff have conducted and assisted in evaluations of numerous and
varied programs of bilingual education. For example, Berkeley office staff
have evaluated programs in Calexico (Reginald A. Corder, Jr.),
Hacienda-La Puente (Patricia Elias, Patricia Wheeler), and El Monte (Corder, S.
Johnson). For the Los Angeles office, J. Richard Harsh evaluated a bilingual
program in Azusa, and Ivor Thomas evaluated one in Fountain Valley.
Donald E. Hood of the Austin office evaluated the Dallas Bilingual
Multicultural Program. These evaluations were variously formative and summative,
and covered bilingual programs that, in combination, served students from
preschool (Fountain Valley) through 12th grade (Calexico).

Camping Programs

Those in charge of a school camping program in New York City felt that it
was having unusual and positive effects on the students, especially in terms
of motivation. ETS was asked to evaluate this program, and did so, using
an innovative design and measurement procedures developed by
Raymond G. Wasdyke and Jerilee Grandy.

Career Education

In this decade of heavy federal emphasis on career education, ETS has
been, and is, involved in the evaluation of numerous programs in that
field. For instance, Raymond G. Wasdyke helped the Newark, Delaware,
school system determine whether its career education goals and programs
were properly meshed. In Dallas, Donald Hood of the ETS regional staff
assisted in developing goal specifications and reviewing evaluation test
items for the Skyline Project, a performance contract calling for the training
of high school students in 12 career clusters. Norman E. Freeberg
developed a test battery to be used in evaluating the Neighborhood Youth
Corps. Ivor Thomas of the Los Angeles office provided formative evaluation
services for the Azusa Unified School District's [illegible] grade career training
and performance program for disadvantaged students. Roy Hardy of the
Atlanta office directed the third-party evaluation of Florida's Comprehensive
[illegible] Vocational Education for Career Development, and
Wasdyke evaluated the Maryland Career Information System. Reginald A.
Corder, Jr., of the Berkeley office assisted in the evaluation of the California
career education program and subsequently directed the evaluation of the
Experience-Based Career Education Models of a number of regional
education laboratories.



Computer-Aided Instruction

Three major computer-aided instruction programs developed for use in
schools and colleges have been evaluated by ETS. The most ambitious is
PLATO from the University of Illinois. Initially, the ETS evaluation was
directed by Ernest Anastasio, but later the effort was divided between
Richard T. Murphy, who focused on college-level programs in PLATO, and
Spencer Swinton and Marianne Amarel, who focused on elementary and
secondary school programs. ETS also directed the evaluation of TICCIT, an
instructional program for junior colleges that uses small-computer
technology; the study was conducted by Donald L. Alderman. Currently, Marjorie
Ragosta is directing the evaluation of the first major in-school longitudinal
demonstration of computer-aided instruction for low-income students.

Drug Programs

Robert F. Boldt served as a consultant on the National Academy of
Sciences study assessing the effectiveness of drug antagonists (less harmful
drugs that will fight the impact of illegal drugs). Samuel Ball served on a
National Academy of Sciences panel that designed, for the National
Institutes of Health, a means of evaluating media drug information programs
and spot advertisements.



Educational Television

ETS was responsible for the national summative evaluation of the ETV series
Sesame Street, for preschoolers, and The Electric Company, for reading
students in grades one through four; the principal evaluators were Samuel
Ball, Gerry Ann Bogatz, and Donald B. Rubin. Additionally, Ronald Flaugher
and Joan Knapp evaluated the series Bread and Butterflies to clarify career
choice; Jayjia Hsia evaluated a series on the teaching of English for high
school students and a series on parenting for adults.



Higher Education

Much ETS research in higher education focuses on evaluating students or
teachers, rather than programs, mirroring the fact that systematic program
evaluation is not common at this level. ETS has made, however, at least
two major forays into program evaluation in higher education. In their
Open University study, Rodney T. Hartnett and associates joined with three
American universities (Houston, Maryland, and Rutgers) to see if the British
Open University's methods and materials were appropriate for American
institutions. Mary Jo Clark, Leonard L. Baird, and Hartnett conducted a study
of means of assessing quality in doctoral programs. They established an
array of criteria for use in obtaining more precise descriptions and evaluations
of doctoral programs than the prevailing technique, reputational surveys,
provides.






Preschool Programs

A number of preschool programs have been evaluated by ETS staff,
including the ETV series Sesame Street. Irving Sigel conducted formative studies
of developmental curriculum, Virginia Shipman helped the Bell Telephone
Companies evaluate their day care centers, and Samuel Ball and Brent
Bridgeman provided the U.S. Office of Child Development with a
sophisticated design for the evaluation of Parent-Child Development Centers.

Prison Programs

In New Jersey, ETS has been involved in the evaluation of educational
programs for prisoners. Developed and administered by Mercer County
Community College, the programs have been subject to ongoing study by
Ronald L. Flaugher and Samuel Barnett.

Reading Programs

ETS evaluators have been involved in a variety of ways in a variety of
programs and proposed programs in reading. For example, in an extensive,
large-scale, national evaluation completed in 1976, Donald A. Trismen
studied the effectiveness of reading instruction in compensatory programs.
At the same time, Donald E. Powers conducted a small study of the impact
of a local reading program in Trenton, New Jersey. Ann M. Bussis, Edward
A. Chittenden, and Marianne Amarel, in 1976, reported the results of their
study of primary school teachers' perceptions of their own teaching
behavior. Earlier, Richard T. Murphy surveyed the reading competencies and
needs of the adult population.

Special Education

Samuel Ball and Karla Goldman conducted an evaluation of the largest
private school for the learning disabled in New York City, and Carol Vale of the
ETS office in Berkeley directed a national needs assessment concerning
educational technology and special education. Paul Campbell is directing
a major study of an intervention program for learning disabled juvenile
delinquents.